Abstract

Users expect fast and fluid response from today’s cloud infrastructure. Large-scale computing 
frameworks such as MapReduce divide jobs into many parallel tasks and execute them on 
different machines to enable faster processing. But the tasks on the slowest machines (straggling 
tasks) become the bottleneck in the completion of the job. One way to combat the variability
in machine response time is to add replicas of straggling tasks and wait for one copy to finish.

In this paper we analyze how task replication strategies can be used to reduce latency, 
and their impact on the cost of computing resources. We use extreme value theory to show 
that the tail of the execution time distribution is the key factor in characterizing the trade-off
between latency and computing cost. From this trade-off we can determine which task
replication strategies reduce latency, without a large increase in computing cost. We also propose 
a heuristic algorithm to search for the best replication strategies when it is difficult to fit a 
simple distribution to model the empirical behavior of task execution time, and use the proposed 
analysis techniques. Evaluation of the heuristic policies on Google Trace data shows a significant 
latency reduction compared to the replication strategy used in MapReduce. 


1 Introduction 


Applications such as Google search, Dropbox, and Netflix need to perform enormous amounts of computing or data processing on the cloud. Recently, cloud computing has also been offered as a service by Amazon S3, Microsoft Azure, etc., where users can rent machines by the hour to run their computing jobs. The large-scale sharing of computing resources makes cloud computing flexible and scalable.

Cloud computing frameworks such as MapReduce [8] and Hadoop [18] employ massive parallelization to reduce latency. Large jobs are divided into hundreds of tasks that can be executed in parallel on different machines. An important class of parallel execution is “embarrassingly parallel” computation [22], which requires little or no effort in dividing the computation into independent parallel tasks. Several algorithms used in optimization and machine learning, for example the Alternating Direction Method of Multipliers (ADMM) [4] and Markov Chain Monte-Carlo (MCMC) [14], fall into this class and can be parallelized easily.


*Da Wang was affiliated with the Signals, Information and Algorithm Laboratory when this research was conducted.

†D. Wang and G. Joshi contributed equally to this work.

The execution time of a task on a machine is subject to stochastic variations due to co-hosting, virtualization and other hardware and network variations [11]. Thus, the key challenge in executing a job with a large number of tasks is the latency in waiting for the slowest tasks, or the “stragglers”, to finish. As pointed out in [11, Table 1], the latency of executing many parallel tasks could be significantly larger (140 ms) than the median latency of a single task (1 ms).

In this work we analyze how replication of straggling tasks can be used to reduce latency. In 
particular, we develop a mathematical framework and provide insights into how task replication 
affects the trade-off between latency and computing cost. 


1.1 Related prior work 


The idea of replicating tasks in parallel computing has been recognized by system designers [5, 10] and first adopted at a large scale via the “backup tasks” in MapReduce [8]. A line of systems work [2, 3, 15, 23] further developed this idea to handle various performance variability issues in data centers.

While task replication has been studied in systems literature and also adopted in practice, there 
is not much work on careful mathematical analysis of replication strategies. In |5I] replication 
strategies are analyzed, mainly for the single task case. In this paper we consider task replication 
in a computing job consisting of a large number of tasks, which corresponds closely to today’s large-scale
cloud computing frameworks. We note that using replication or redundancy to reduce latency
has also attracted attention in other contexts such as cloud storage and networking applications [12,

1.2 Our contributions

To the best of our knowledge, we establish the first formal analysis of task replication in jobs 
consisting of a large number of tasks, for the system model and relevant performance measures 
proposed. We analyze the trade-off between the latency and the cost of computing resources, and 
provide insights into design of task replication strategies. 

In particular, we study a class of scheduling policies called single-fork policies and characterize 
how latency and resource usage depend on three parameters: when we replicate tasks, how many 
replicas are launched, and whether the original replicas are killed or not. We show that the tail of 
the execution time distribution (heavy, light or exponential tail) is the key factor that determines 
the choice of the task replication policy. In particular for heavy tail distributions e.g. Pareto, we 
identify scenarios where the latency and computing cost can be reduced simultaneously. We also 
propose a heuristic algorithm to find the best single-fork policy when it is hard to use the proposed 
analysis techniques for the empirical distribution of task execution time. 


1.3 Organization 

The rest of the paper is organized as follows. In Section 2 we formulate the problem, define the latency and cost metrics and introduce notation used in the paper. In Section 3 we provide a summary of the analysis of single-fork task replication policies. Then in Section 4 we describe a heuristic algorithm to find a good scheduling policy for execution time distributions for which we cannot find an analytical distribution that fits well. Proofs of the analysis are given in Section 5. In Section 6 we conclude with a discussion of the implications and future perspectives. Finally, results from order statistics and extreme value theory used in this work are given in the Appendix.

2 Problem Formulation


We now describe the system model, and propose the performance metrics used to evaluate a task 
replication strategy. 

2.1 Notation 

First, we define some notation used in this paper. Lower-case letters (e.g., $x$) denote a particular value of the corresponding random variable, which is denoted by an upper-case letter (e.g., $X$). We denote the cumulative distribution function (CDF) of $X$ by $F_X(x)$. Its complement, the tail distribution, is denoted by

$$\bar{F}_X(x) = 1 - F_X(x),$$

which is sometimes more convenient to use than the CDF. We denote the upper end point of $F_X$ by

$$\omega(F_X) = \sup\{x : F_X(x) < 1\}. \tag{1}$$

For i.i.d. random variables $X_1, X_2, \dots, X_n$, we define $X_{j:n}$ as the $j$-th order statistic, i.e., the $j$-th smallest of the $n$ random variables.

2.2 System Model 

Consider a job consisting of $n$ parallel tasks, where $n$ is large. Analysis of real-world trace data shows that it is common for a job to contain hundreds or even thousands of tasks [16]. We assume the execution time of each task on a computing node is independent and identically distributed (i.i.d.) according to $F_X$, where $F_X$ is the cumulative distribution function (CDF) of random variable $X$.

The distribution $F_X$ accounts for the variability in the machine response due to various factors such as congestion, queueing, virtualization etc. We consider that there is an unlimited pool of machines such that each new task (or new replica) is assigned to a new machine. Hence, the execution time of a task can be assumed to be i.i.d. across machines.

Since we assume an unlimited pool of machines, the delay due to queueing of tasks at machines is small and can be subsumed as a constant additive term in the execution time $X$. Also, we do not consider the dependence of the task execution time on the size of the task itself. But it can be accounted for similarly by adding a constant delay (fixed across tasks of the same job) to the expected completion time of the job defined in Section 2.4.

2.3 Scheduling Policy 

A scheduling policy or scheduler assigns tasks to different machines, possibly at different time 
instants. We assume that the scheduler receives instantaneous feedback notifying it when a machine 
finishes its assigned task. But there is no intermediate feedback indicating the status of processing 
of a task. When the scheduler receives notification that at least one replica of each of the n tasks 
has finished, it kills all the residual running replicas. It can also use the feedback to decide when 
to selectively launch or kill replicas in order to reduce the overall job completion time. 

The times when the scheduler launches or kills replicas could be pre-determined (static policy) 
or be dependent on the feedback about the execution of other tasks in the job (dynamic policy). 
We focus our attention on a set of dynamic policies called single-fork policies, defined as follows.

Figure 1: Illustration of single-fork policies with and without relaunching.

Definition 1 (Single-fork Scheduling policy). A single-fork scheduling policy $\pi_{SF}(p, r, l)$ launches a single replica of all $n$ tasks at time 0. It waits until $(1 - p)n$ tasks finish and then, for each of the $pn$ straggling tasks, chooses one of the following two actions:

• replicate without relaunching ($l = 1$): launch $r$ new replicas;

• replicate with relaunching ($l = 0$): kill the original copy and launch $r + 1$ new replicas.

When the earliest replica of a task finishes, all the other replicas of that task are terminated.

We use $l$ to denote the number of original replicas of each task remaining after the forking point. Hence $l = 0$ when the original replica is killed and restarted, and $l = 1$ otherwise. Note that in both the relaunching ($l = 0$) and no relaunching ($l = 1$) cases there are a total of $r + 1$ replicas running after the forking point. The effect of $r$ and $l$ on the replication of straggling tasks is illustrated in Fig. 1.

For simplicity of notation we assume that p is such that pn is an integer. We note that p = 0 
corresponds to running n tasks in parallel and waiting for all to finish, which is the baseline case 
without any replication or relaunching. 
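To make the definition concrete, the following is a minimal simulation sketch of one job under a single-fork policy. It is only an illustration under stated assumptions: the names `simulate_single_fork` and `pareto_sampler` are hypothetical, the sampler interface `sample_x(size, rng)` is our own choice, and $pn$ is rounded to the nearest integer.

```python
import numpy as np

def pareto_sampler(size, rng, alpha=2.0, x_m=2.0):
    """Draw Pareto(alpha, x_m) execution times by inverse-transform sampling."""
    return x_m * (1.0 - rng.random(size)) ** (-1.0 / alpha)

def simulate_single_fork(sample_x, n, p, r, l, rng):
    """Simulate one job under a single-fork policy and return (T, C).

    sample_x(size, rng) is a hypothetical sampler of i.i.d. execution times.
    l = 1 keeps the original copy of each straggler; l = 0 kills it.
    """
    x = np.sort(sample_x(n, rng))            # original copies, all launched at time 0
    n_strag = int(round(p * n))              # pn straggling tasks
    if n_strag == 0:                         # baseline: wait for every original copy
        return float(x[-1]), float(x.mean())
    n_done = n - n_strag
    t_fork = x[n_done - 1] if n_done > 0 else 0.0   # forking point

    cols = []
    n_new = r + 1 - l                        # fresh replicas launched per straggler
    if n_new > 0:
        cols.append(sample_x((n_strag, n_new), rng))
    if l == 1:                               # residual time of the surviving originals
        cols.append((x[n_done:] - t_fork)[:, None])
    y = np.hstack(cols).min(axis=1)          # time after the fork until a replica finishes

    latency = t_fork + y.max()
    # Running time: finished originals, stragglers up to the fork, then r + 1 replicas each.
    cost = (x[:n_done].sum() + n_strag * t_fork + (r + 1) * y.sum()) / n
    return float(latency), float(cost)
```

With $r = 0$ and $l = 1$ the sketch reduces to the baseline for any $p$: the latency is the largest original execution time and the cost is the mean execution time.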

Remark 1 (Backup tasks in MapReduce). The idea of ‘backup’ tasks used in Google’s MapReduce is a special case of the single-fork policy. Following our notation, it corresponds to $r = 1$ and $l = 1$. The value of $p$ is tuned dynamically and hence not specified in [8].

Although we focus on single-fork policies in this paper, the analysis can be generalized to multi-fork 
policies, where new replicas of straggling tasks are launched at multiple times during the execution 
of the job. Forking multiple times can give a better latency-cost trade-off, but could be undesirable 
in practice due to additional delay and complexity in obtaining new and killing existing replicas. 


2.4 Performance metrics 


Our objective is to find the best single-fork policy $\pi_{SF}(p, r, l)$ for a given task execution time distribution $F_X$. We now define the performance metrics of latency and resource usage that are used to evaluate different scheduling policies.


Definition 2 (Expected Latency). The expected latency $E[T]$ is the expected value of $T$, the time taken for at least one replica of each of the $n$ tasks to finish. It can be expressed as

$$E[T] = E\left[\max_{1 \le i \le n} T_i\right], \tag{2}$$

where $T_i$ is the minimum of the finish times of the replicas of task $i$, which depends on the scheduling policy used.

Figure 2: Illustration of $T$ and $C$ for a job with two tasks, and two replicas of each task. The latency is $T = \max(8, 10) = 10$, and the computing cost is $C = (8 + 6 + 10 + 5)/2 = 14.5$.


Definition 3 (Expected Cost). The expected computing cost E [C] is the sum of the running 
times of all machines, normalized by n, the number of tasks in the job. The running time is the 
time from when the task is launched on a machine, until it finishes, or is killed by the scheduler. 


For a user of a cloud computing service such as Amazon Web Services (AWS), which charges
the user by time and number of machines used, the money paid by the user to rent the machines 
is proportional to E [C]. 

Example 1. Suppose the scheduler launches $r$ replicas of each of the $n$ tasks at times $t_{i,j}$ for $j = 1, 2, \dots, r$. Then the latency $T_i$ of the $i$-th task is given by

$$T_i = \min_{1 \le j \le r} (t_{i,j} + X_{i,j}), \tag{3}$$

where $X_{i,j}$ are i.i.d. draws from the execution time distribution $F_X$. The computing cost $C$ can be expressed as

$$C = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{r} |T_i - t_{i,j}|^{+}, \tag{4}$$

where $|x|^{+} = \max(0, x)$.

Fig. 2 illustrates the execution of a job with two tasks, and the evaluation of the corresponding latency $T$ and cost $C$. Given two tasks, we launch two replicas of task 1 at $t_{1,1} = 0$ and $t_{1,2} = 2$, and two replicas of task 2 at $t_{2,1} = 0$ and $t_{2,2} = 5$. The task execution times are $X_{1,1} = 8$, $X_{1,2} = 7$, $X_{2,1} = 11$, and $X_{2,2} = 5$. Machine $M_1$ finishes task 1 first, at time $T_1 = 8$, and the second replica running on $M_2$ is terminated before it finishes executing. Similarly, machine $M_4$ finishes task 2 at time $T_2 = 10$, and the replica running on $M_3$ is terminated. Thus the latency of the job is $T = \max\{T_1, T_2\} = 10$. The cost is the sum of all running times normalized by $n$, i.e., $C = (8 + 6 + 10 + 5)/2 = 14.5$.
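The arithmetic of this example can be reproduced with a few lines of code. The sketch below is illustrative only; the function name `latency_and_cost` and its list-of-lists input format are our own assumptions, and the running time of each replica is taken from its launch until the first replica of the same task finishes.

```python
def latency_and_cost(launch_times, exec_times):
    """Return (T, C) for a job, given per-task replica launch and execution times.

    launch_times[i][j] and exec_times[i][j] describe replica j of task i.
    A replica runs from its launch until the first replica of its task finishes.
    """
    n = len(launch_times)
    finish_times = []
    total_running = 0.0
    for t_i, x_i in zip(launch_times, exec_times):
        T_i = min(t + x for t, x in zip(t_i, x_i))       # earliest replica to finish
        finish_times.append(T_i)
        total_running += sum(max(0.0, T_i - t) for t in t_i)
    return max(finish_times), total_running / n

# The two-task job of the example: T = 10 and C = 14.5.
T, C = latency_and_cost([[0, 2], [0, 5]], [[8, 7], [11, 5]])
```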

In this work we analyze the trade-off between $E[T]$ and $E[C]$ for the single-fork policy and provide insights into the design of scheduling policies that achieve a good trade-off.


3 Single-fork policy analysis 

In this section we evaluate the performance metrics $E[T]$ and $E[C]$ for a given single-fork policy $\pi_{SF}(p, r, l)$. The main insight we get from this analysis is that the tail behavior (heavy, light or exponential) of $F_X$ is the key factor in characterizing the latency-cost trade-off. We demonstrate this for two canonical distributions: Pareto and Shifted Exponential. All proofs are deferred to Section 5.

3.1 Performance characterization


For a job with a large number of tasks $n$, the expected latency and cost can be expressed in terms of the single-fork policy parameters $p$, $r$ and $l$ as given by Theorem 1 below.

Theorem 1 (Single-Fork Latency and Cost). For a computing job with $n$ tasks, and task execution time distribution $F_X$, the latency and cost metrics as $n \to \infty$ are

$$E[T] = F_X^{-1}(1 - p) + E[Y_{pn:pn}], \tag{5}$$

$$E[C] = \int_0^{1-p} F_X^{-1}(h)\,dh + p F_X^{-1}(1 - p) + (r + 1)p \cdot E[Y], \tag{6}$$

where $F_Y$ is given by Lemma 2 below, and it is the distribution of the residual time after forking when the earliest replica of a straggling task finishes. The term $E[Y_{pn:pn}]$ is the expected maximum of $pn$ i.i.d. random variables drawn from $F_Y$. Its behavior for $n \to \infty$ is given by Lemma 3 below.

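Lemma 2 (Residual Straggler Execution Time). As $n \to \infty$, the tail distribution $\bar{F}_Y$ of the residual execution time (after the forking point) of each of the $pn$ straggling tasks is

$$\bar{F}_Y(y) = \begin{cases} \bar{F}_X(y)^{r+1} & \text{if } l = 0, \\ \frac{1}{p}\, \bar{F}_X(y)^{r}\, \bar{F}_X\left(y + F_X^{-1}(1 - p)\right) & \text{if } l = 1. \end{cases} \tag{7}$$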
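As an illustration of how the residual tail can be evaluated, the sketch below implements the two cases directly. The function name `residual_tail` and the callables `tail_x` (for $\bar{F}_X$) and `quantile_x` (for $F_X^{-1}$) are hypothetical; the $1/p$ factor in the $l = 1$ case is the conditioning on the original copy still running at the forking point $F_X^{-1}(1 - p)$.

```python
import math

def residual_tail(y, tail_x, quantile_x, p, r, l):
    """Tail of the residual time Y after the fork, for one straggling task.

    tail_x(y) is the tail of X and quantile_x(h) is the inverse CDF of X.
    l = 0: the original is killed and r + 1 fresh replicas race.
    l = 1: r fresh replicas race with the original, which has survived
           up to the forking point quantile_x(1 - p).
    """
    if l == 0:
        return tail_x(y) ** (r + 1)
    return tail_x(y) ** r * tail_x(y + quantile_x(1.0 - p)) / p

# Pure exponential X with rate 1: by memorylessness both cases coincide.
exp_tail = lambda y: math.exp(-max(y, 0.0))
exp_quantile = lambda h: -math.log(1.0 - h)
```

For the pure exponential distribution the two cases give the same tail, $e^{-(r+1)y}$, which reflects the fact that killing a memoryless original copy neither helps nor hurts the residual time.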


Lemma 3. As $n \to \infty$, the asymptotic behavior of $E[Y_{pn:pn}]$ is given as follows.

1. If $F_Y \in DA(\Lambda)$,

$$E[Y_{pn:pn}] = \tilde{a}_{pn} \gamma_{EM} + \tilde{b}_{pn}.$$

2. If $F_Y \in DA(\Phi_{(r+1)\xi})$,

$$E[Y_{pn:pn}] = \tilde{a}_{pn} \Gamma\left(1 - 1/[(r + 1)\xi]\right).$$

3. If $F_Y \in DA(\Psi_{((1-l)r+1)\xi})$,

$$E[Y_{pn:pn}] = \omega(F_Y) - \tilde{a}_{pn} \Gamma\left(1 + 1/[((1 - l)r + 1)\xi]\right).$$

Here $DA(\cdot)$ is the domain of attraction of $F_Y$, which can be determined using Lemma 9, $\gamma_{EM}$ is the Euler-Mascheroni constant, and $\Gamma(\cdot)$ is the Gamma function, i.e.,

$$\Gamma(t) = \int_0^{\infty} x^{t-1} e^{-x}\,dx. \tag{8}$$

The terms $\tilde{a}_{pn}$ and $\tilde{b}_{pn}$ are the normalizing constants of $F_Y$ as given by Theorem 11.


The domain of attraction of a distribution depends on its tail behavior (exponential, heavy or light). For example, exponentially decaying distributions belong to $DA(\Lambda)$ while heavy-tailed distributions belong to $DA(\Phi_\xi)$.
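As a quick sanity check of the exponentially decaying case, consider the maximum of $m$ i.i.d. exponential random variables with rate $\lambda$. Its expectation is known exactly (the $m$-th harmonic number divided by $\lambda$), so it can be compared with the Gumbel-type approximation $\tilde{b}_m + \tilde{a}_m \gamma_{EM}$. The sketch below assumes the standard normalizing constants $\tilde{a}_m = 1/\lambda$ and $\tilde{b}_m = \ln(m)/\lambda$ for the exponential distribution; the function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def expected_max_exponential_exact(m, lam):
    """Exact E[max of m i.i.d. Exp(lam)]: the m-th harmonic number over lam."""
    return sum(1.0 / k for k in range(1, m + 1)) / lam

def expected_max_gumbel_approx(m, lam):
    """Extreme-value approximation b_m + a_m * gamma_EM for Exp(lam)."""
    a_m = 1.0 / lam              # assumed scale normalizing constant
    b_m = math.log(m) / lam      # assumed location normalizing constant
    return b_m + a_m * EULER_GAMMA

# For m = 400 the two values differ by roughly 1/(2m).
gap = expected_max_exponential_exact(400, 1.0) - expected_max_gumbel_approx(400, 1.0)
```

The gap shrinks as $m$ grows, which is why the asymptotic expressions are already accurate for jobs with a few hundred tasks in this case.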

We now give a sketch of the proof of Theorem 1. A detailed proof can be found in Section 5.

Proof sketch of Theorem 1. The expected latency of a single-fork policy $\pi_{SF}(p, r, l; n)$ can be decomposed into two parts:

$$E[T] = E[T^{(1)}] + E[T^{(2)}], \tag{9}$$

where $T^{(1)}$ is the time to execute the first $(1 - p)n$ tasks and $T^{(2)}$ is the time to execute the rest of the $pn$ tasks with replication. We evaluate each of these parts separately.

It is straightforward to see that $T^{(1)}$ is the $((1 - p)n)$-th order statistic of $n$ i.i.d. random variables with distribution $F_X$. Thus its expected value is

$$E[T^{(1)}] = E[X_{(1-p)n:n}] \tag{10}$$

$$\approx F_X^{-1}(1 - p) \quad \text{for large } n, \tag{11}$$

where (11) follows from the Central Value Theorem (Theorem 10), which states that the $((1 - p)n)$-th order statistic concentrates sharply around $F_X^{-1}(1 - p)$ as $n \to \infty$.

After $n(1 - p)$ tasks finish, the scheduler adds redundancy by launching replicas of the straggling tasks and waits for one replica to finish. We denote the residual execution time distribution of the straggling tasks by $F_Y$. It depends on $F_X$ and the parameters $r$, $p$ and $l$ of the scheduling policy as given by Lemma 2. For example, for $r = 1$ and $l = 0$, the tail distribution is $\bar{F}_Y = \bar{F}_X^2$, which is the tail distribution of the minimum of two i.i.d. random variables with distribution $F_X$. The proof of Lemma 2 is given in Section 5.

The second part of the latency, $T^{(2)}$, is the maximum of the times until each of the $pn$ straggling tasks finishes. Thus, its expected value is $E[T^{(2)}] = E[Y_{pn:pn}]$. The behavior of the maximum order statistic of a large number of random variables is given by the Extreme Value Theorem (Theorem 11). We can use it to show that $E[T^{(2)}]$ is given by Lemma 3.

Thus we can evaluate the expected latency $E[T] = E[T^{(1)}] + E[T^{(2)}]$. Similarly, the expected cost $E[C]$ can be evaluated by decomposing it into two parts: before and after the replication of straggling tasks. The details can be found in the proof in Section 5. □


3.2 Examples of the Effect of Tail Behavior 

We now demonstrate how the tail of the distribution $F_X$ is a major factor in determining the trade-off between $E[T]$ and $E[C]$, and hence the choice of the best single-fork policy. We consider two canonical execution time distributions, the Pareto distribution (heavy-tailed) and the Shifted Exponential distribution (exponential tail), and evaluate the latency-cost trade-off in Theorem 1 for them. One key insight from this analysis is that in certain regimes, it is possible to reduce latency while simultaneously reducing cost.


3.2.1 Pareto execution time 

The cumulative distribution function of the Pareto distribution Pareto$(\alpha, x_m)$ is

$$F(x; \alpha, x_m) = \begin{cases} 1 - \left(\frac{x_m}{x}\right)^{\alpha} & x \ge x_m, \\ 0 & \text{otherwise.} \end{cases} \tag{12}$$

The Pareto distribution is a heavy-tailed distribution, with a polynomially decaying tail. It has been observed to fit task execution time distributions in data centers [11, 16].

Figure 3: Comparison of the expected latency $E[T]$ obtained from simulation (points) and analytical calculations (lines) for the Pareto distribution Pareto(2, 2). Panels: $p = 0.1$ and $p = 0.2$. Curves: $r = 1$ and $r = 2$, each with and without relaunching.


Theorem 4. For a computing job with $n$ tasks, if the execution time distribution of each task is Pareto$(\alpha, x_m)$, then as $n \to \infty$, the latency and cost metrics are

$$E[T] = x_m p^{-1/\alpha} + \Gamma\left(1 - \frac{1}{(r + 1)\alpha}\right) \tilde{a}_{pn}, \tag{13}$$

$$E[C] = x_m \frac{\alpha}{\alpha - 1} - x_m \frac{p^{1 - 1/\alpha}}{\alpha - 1} + (r + 1)p E[Y]. \tag{14}$$

The values of $\tilde{a}_{pn}$ and $E[Y]$ depend on the relaunching parameter $l$, and are given as follows.

Case 1: Relaunching ($l = 0$)

$$\tilde{a}_{pn} = (pn)^{1/((r+1)\alpha)} x_m, \tag{15}$$

$$E[Y] = \frac{(r + 1)\alpha}{(r + 1)\alpha - 1} x_m. \tag{16}$$

Case 2: No Relaunching ($l = 1$)

The term $\tilde{a}_{pn}$ is the solution to

$$n^{1/\alpha} x_m^{r+1} = x_m p^{-1/\alpha} \tilde{a}_{pn}^{r} + \tilde{a}_{pn}^{r+1}, \tag{17}$$

and $E[Y]$ is evaluated numerically as discussed in the proof.
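The expressions of Theorem 4 are easy to evaluate numerically. The sketch below is one possible way to do so, under stated assumptions: it requires $\alpha > 1$, the cost term uses the factor $p^{1 - 1/\alpha}$ obtained by evaluating the first two terms of (6) for the Pareto distribution, and the no-relaunching constant is found from (17) by bisection. The function names are hypothetical.

```python
import math

def pareto_relaunch_metrics(n, p, r, alpha, x_m):
    """E[T] and E[C] for Pareto(alpha, x_m) with relaunching (l = 0)."""
    k = (r + 1) * alpha
    a_pn = (p * n) ** (1.0 / k) * x_m                     # normalizing constant, (15)
    e_y = k / (k - 1.0) * x_m                             # mean of the min of r + 1 copies, (16)
    e_t = x_m * p ** (-1.0 / alpha) + math.gamma(1.0 - 1.0 / k) * a_pn
    e_c = (x_m * alpha / (alpha - 1.0)
           - x_m * p ** (1.0 - 1.0 / alpha) / (alpha - 1.0)
           + (r + 1) * p * e_y)
    return e_t, e_c

def solve_a_pn_no_relaunch(n, p, r, alpha, x_m, iters=200):
    """Solve (17) for the normalizing constant by bisection (l = 1)."""
    target = n ** (1.0 / alpha) * x_m ** (r + 1)
    c = x_m * p ** (-1.0 / alpha)
    f = lambda a: c * a ** r + a ** (r + 1) - target      # increasing in a
    lo, hi = 0.0, target ** (1.0 / (r + 1))               # f(hi) >= 0 by construction
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Pareto(2, 2), n = 400, p = 0.1, r = 1 with relaunching.
e_t, e_c = pareto_relaunch_metrics(400, 0.1, 1, 2.0, 2.0)
```

For these parameters the cost comes out slightly below the mean execution time $x_m \alpha/(\alpha - 1) = 4$ of a single unreplicated task, in line with the observation that a small amount of replication can lower both metrics for heavy-tailed execution times.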


The proof is given in Section 5. From this theorem we can infer that the latency $E[T]$ grows polynomially with $n$, on the order of $n^{1/(\alpha(r+1))}$. This can be seen directly from (15) for the case of relaunching ($l = 0$). For the no relaunching case we know that $\tilde{a}_{pn}$ grows with $n$. Thus for large enough $n$, it should be greater than $x_m p^{-1/\alpha}$. Hence from (17),

$$n^{1/\alpha} x_m^{r+1} < 2 \tilde{a}_{pn}^{r+1}. \tag{18}$$

The $n^{1/(\alpha(r+1))}$ growth follows from this, since (17) also gives $\tilde{a}_{pn}^{r+1} < n^{1/\alpha} x_m^{r+1}$.

Fig. 3 compares the latency obtained from simulation and analytical calculations for the Pareto distribution, indicating that the latency obtained from analytical calculation is very close to the actual performance for $n > 100$, especially for the case with relaunching ($l = 0$). In Fig. 4 we plot the expected latency and cost as $p$ varies, for different values of $r$ and $l$. The black dot is the baseline case ($p = 0$), where no replication is used and we simply wait for the original copies of all $n$ tasks to finish. The baseline case is also equivalent to the policies with $r = 0$, $l = 1$ and any $p$.

Figure 4: Expected latency and cloud user cost for a Pareto execution time distribution Pareto(2, 2), given $n = 400$. Curves: baseline; $r = 0$ with relaunching; $r = 1$ and $r = 2$, each with and without relaunching.

Figure 5: Expected latency $E[T]$ versus the expected cost $E[C]$ for Pareto(2, 2) and $n = 400$, by varying $p$ along each curve in the range of $[0, 1]$. For small $p$, we can reduce both latency and cost simultaneously.

In Fig. [4] we observe that a small amount of replication (small p and r) can reduce latency 
significantly in comparison with the baseline case. But as p increases further, the latency may 
increase (as observed for r = 0) because of the second term in ([ 2 ]). For a given r, relaunching leads 
to lower latency when p satisfies the condition in Lemma [5] below. 


Lemma 5. For a given $r$, relaunching ($l = 0$) gives lower latency $E[T]$ than no relaunching ($l = 1$) when $p$ satisfies

$$p^{1/\alpha} + (np)^{-1/((r+1)\alpha)} < 1, \tag{19}$$

where $\alpha$ is the shape parameter of the Pareto distribution, as given in (12).
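The condition of Lemma 5 is a one-line check. The helper below is a hypothetical illustration (the name `relaunch_lowers_latency` is ours); it simply evaluates the left-hand side $p^{1/\alpha} + (np)^{-1/((r+1)\alpha)}$ and compares it with 1.

```python
def relaunch_lowers_latency(p, n, r, alpha):
    """True when relaunching (l = 0) gives lower E[T] than keeping the original (l = 1)."""
    lhs = p ** (1.0 / alpha) + (n * p) ** (-1.0 / ((r + 1) * alpha))
    return lhs < 1.0

# Pareto shape alpha = 2, n = 400, r = 1: relaunching helps at p = 0.1 but not at p = 0.9.
small_p = relaunch_lowers_latency(0.1, 400, 1, 2.0)
large_p = relaunch_lowers_latency(0.9, 400, 1, 2.0)
```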


Intuition suggests that replicating earlier (larger $p$) and more (higher $r$) will increase the cost $E[C]$. But Fig. 4 shows that this is not necessarily true. Since we kill all the machines running a task when one of its replicas finishes, there is in fact a saving in the computing cost! However, this benefit diminishes as $p$ and $r$ increase above a certain threshold.

Figure 6: Comparison of the expected latency $E[T]$ obtained from simulation (points) and analytical calculations (lines) for the Shifted Exponential distribution SExp(1, 1).

Fig. 5 shows the latency versus the computing cost for different values of $r$ and $l$, with $p$ varying along each curve. Depending upon the latency requirement and the limit on the cost, one can choose an appropriate operating point on this trade-off. This plot demonstrates the non-intuitive phenomenon that it is possible to reduce latency (from 70 to about 15 for the $r = 1$ and $r = 2$ cases) while also reducing cost! In Lemma 6 we identify the values of $p$ for the relaunching ($l = 0$) case for which the corresponding single-fork policy is sub-optimal in both $E[T]$ and $E[C]$.

Lemma 6. For a given r and relaunching (l = 0), the range of p for which the single-fork policy 
is sub-optimal is given by 


r l- 


(r + 


n 1 /( r + 1 ) a _ f r + 

-l)aj ) 

( (r + l) 2 ct p~ l / a 


(r + l)ct — 1 


a 


= 0 , 


( 20 ) 


where a policy $\pi_{SF}(p, r, l)$ is said to be sub-optimal if there exists another policy $\pi_{SF}(p', r, l)$ which gives lower $E[T]$ and $E[C]$ than $\pi_{SF}(p, r, l)$.


For example, for the $r = 1$ and relaunch ($l = 0$) case, we can solve (20) to show that all policies with $p < p_1^* \approx 0.05$ are sub-optimal, where $p_1^*$ is marked in Fig. 5. Similarly, for the cases $r = 0$ and $r = 2$, the sub-optimal ranges $[0, p_0^*]$ and $[0, p_2^*]$ are shown respectively in Fig. 5.

We conjecture that the convex hull of the curves for different r and l gives the optimal latency- 
cost trade-off. Points on the hull, which lie between some two curves can be achieved by time-sharing 
between the corresponding two policies. 


3.2.2 Shifted Exponential execution time 

We now analyze the latency-cost trade-off when the task execution time distribution $F_X$ is a Shifted Exponential distribution SExp$(\Delta, \lambda)$. Unlike the Pareto distribution, which is heavy-tailed, the shifted exponential has an exponentially decaying tail. Its cumulative distribution function is given by

$$F(x) = \begin{cases} 1 - e^{-\lambda(x - \Delta)} & \text{for } x \ge \Delta, \\ 0 & \text{otherwise.} \end{cases} \tag{21}$$

Figure 7: Expected latency and cost for a Shifted Exponential execution time distribution SExp(1, 1), given $n = 400$.


The special case $\Delta = 0$ corresponds to the pure exponential distribution, which is popular in queueing theory and scheduling due to its memoryless property. But it may not be suitable for modeling the execution time of a task, as a task seldom finishes instantaneously, and usually it is lower bounded by a constant delay due to machine start-up or task initialization. Hence we add the constant delay $\Delta$ and model the execution time using SExp$(\Delta, \lambda)$.
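For simulations, shifted exponential execution times can be drawn by inverse-transform sampling. The sketch below assumes the parametrization with shift $\Delta$ (constant start-up delay) and rate $\lambda$, so that the inverse CDF is $F^{-1}(h) = \Delta - \ln(1 - h)/\lambda$; the function names are hypothetical.

```python
import numpy as np

def sexp_quantile(h, delta, lam):
    """Inverse CDF of the shifted exponential: delta - ln(1 - h) / lam."""
    return delta - np.log1p(-np.asarray(h, dtype=float)) / lam

def sample_sexp(size, delta, lam, rng):
    """Draw shifted exponential execution times by inverse-transform sampling."""
    return sexp_quantile(rng.random(size), delta, lam)

# Every sample is at least delta, and the sample mean is close to delta + 1 / lam.
samples = sample_sexp(200_000, delta=1.0, lam=1.0, rng=np.random.default_rng(0))
```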

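Theorem 7. For a computing job with $n$ tasks, if the execution time distribution of each task is SExp$(\Delta, \lambda)$, then as $n \to \infty$, the latency and cost metrics are

$$E[T] = \frac{2r + 1}{r + 1} \Delta + \frac{1}{(r + 1)\lambda} \left(\ln n - r \ln p + \gamma_{EM}\right), \tag{22}$$

$$E[C] = \begin{cases} \Delta + \frac{1}{\lambda} + p\left[\Delta + \frac{r}{\lambda}\left(1 - e^{-\lambda\Delta}\right)\right], & l = 1, \\ \Delta + \frac{1}{\lambda} + p(r + 2)\Delta, & l = 0. \end{cases} \tag{23}$$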

Similar to Fig. 3, Fig. 6 compares the latency obtained from simulation and analytical calculations for the Shifted Exponential distribution, which again demonstrates the effectiveness of the asymptotic theory.

We can draw the following observations from Theorem 7. Given $r$ and $l$, replicating earlier (larger $p$) gives an $O(\ln p)$ decrease in latency, and a linear increase in the cost. This is also illustrated in Fig. 7 for the execution time distribution SExp(1, 1) and $n = 400$. Fig. 8 illustrates the latency-cost trade-off. Unlike Fig. 5, there is no range of $p$ for which both latency and cost decrease (or increase) simultaneously.

For the special case of $\Delta = 0$, by Theorem 7 the cost is $E[C] = 1/\lambda$, which is independent of $p$ and $r$. But latency always reduces with $r$ and $p$. This suggests that we can achieve arbitrarily low latency without any increase in cost. Since this is not observed in practice, we can conclude that the pure exponential distribution is not a useful model for the task execution time.

For a given $p$ and $r$, relaunching always gives larger latency than no relaunching. But the cost may increase or decrease depending on the values of $\Delta$ and $\lambda$. Lemma 8 below gives the set of parameters for which latency and cost are strictly larger with relaunching.


Lemma 8. If $\Delta > \beta^*/\lambda$, where $\beta^* > 0$ is the solution to

$$\beta r + \beta - r + r e^{-\beta} = 0,$$

then relaunching leads to strictly larger latency and cloud computing cost than no relaunching. In particular, $\beta^* < 1$, hence if $\Delta > 1/\lambda$, then no relaunching achieves a better trade-off between latency and cost than relaunching.

Figure 8: The trade-off between expected latency $E[T]$ and expected cost $E[C]$ for SExp(1, 1) and $n = 400$, by varying $p$ in the range of $[0.05, 0.95]$.


Algorithm 1 Estimate Latency and Cost Metrics
Find empirical CDF $\hat{F}_X(x)$ from the latency traces
Find CDF $\hat{F}_Y(y)$ using Lemma 2
for $i = 1, 2, \dots, m$ do
    Draw $n$ samples from $\hat{F}_X$
    $T_1^{(i)} \leftarrow$ the $n(1 - p)$-th smallest sample
    $C_1^{(i)} \leftarrow$ the mean of the smallest $n(1 - p)$ samples
    Draw $np$ samples from $\hat{F}_Y$
    $Y_{\max}^{(i)} \leftarrow$ the maximum of the samples
    $Y_{\text{avg}}^{(i)} \leftarrow$ the mean of the samples
    $\hat{T}^{(i)} \leftarrow T_1^{(i)} + Y_{\max}^{(i)}$
    $\hat{C}^{(i)} \leftarrow C_1^{(i)} + p T_1^{(i)} + (r + 1) p\, Y_{\text{avg}}^{(i)}$
end for
$\hat{T} \leftarrow$ mean of $\hat{T}^{(i)}$ for $i = 1, 2, \dots, m$
$\hat{C} \leftarrow$ mean of $\hat{C}^{(i)}$ for $i = 1, 2, \dots, m$

4 Heuristic Algorithm 

In certain practical systems it may be difficult to fit a well-known distribution such as Pareto or shifted exponential to the empirical behavior of the task execution time. This makes the analysis of the latency-cost trade-off using the framework presented in Section 3 hard. In this section we present algorithms that use traces of task execution times to estimate the latency and cost metrics, and use these estimates to search for the best single-fork policy $\pi_{SF}(p, r, l)$. We also evaluate the algorithms on Google trace data [1].

4.1 Estimation of Latency and Cost Metrics

In Algorithm 1 we propose an algorithm to estimate $E[T]$ and $E[C]$ based on random sampling of distributions $F_X$ and $F_Y$. The estimates are denoted by $\hat{T}$ and $\hat{C}$ respectively.

We first construct empirical CDFs $\hat{F}_X$ and $\hat{F}_Y$ from experimental traces of start and finish times of tasks on a large cluster of machines. The traces need to be sufficient to capture the tail behavior of the distribution, which plays a key role in characterizing the latency-cost trade-off, as seen in Section 3. To estimate the terms of $E[T]$ and $E[C]$ in (5) and (6) involving $F_X$, we draw $n$ i.i.d. samples from $\hat{F}_X$ and find the maximum and mean of the smallest $n(1 - p)$ samples. Similarly, to estimate the terms in (5) and (6) involving $Y$, we draw $pn$ samples from $\hat{F}_Y$ and find the maximum and mean of the samples. The above steps are repeated $m$ times and $\hat{T}$ and $\hat{C}$ are set to the means of the corresponding estimates from each step.
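The sampling procedure described above can be sketched as follows. This is an illustrative implementation under several assumptions of ours, not a definitive one: the function name `estimate_latency_cost` is hypothetical, sampling from the empirical CDF is done by resampling the trace with replacement, samples of $Y$ are generated from the race described by Lemma 2 rather than from an explicit $\hat{F}_Y$, and the first cost term is normalized by $n$ (the sum of the smallest $n(1 - p)$ samples divided by $n$), which is our reading of the integral term in the expected cost.

```python
import numpy as np

def estimate_latency_cost(trace, n, p, r, l, m=100, seed=0):
    """Monte Carlo estimates (T_hat, C_hat) of a single-fork policy from a trace."""
    rng = np.random.default_rng(seed)
    trace = np.asarray(trace, dtype=float)
    n_strag = int(round(p * n))
    n_done = n - n_strag
    if not (0 < n_strag < n):
        raise ValueError("this sketch assumes 0 < pn < n")
    q = np.quantile(trace, 1.0 - p)          # empirical (1 - p)-quantile of X
    survivors = trace[trace > q] - q         # residual times of originals still running at q
    if l == 1 and survivors.size == 0:
        raise ValueError("trace has no samples beyond the (1 - p)-quantile")

    t_hat, c_hat = [], []
    for _ in range(m):
        x = np.sort(rng.choice(trace, size=n))
        t1 = x[n_done - 1]                   # the n(1 - p)-th smallest sample
        c1 = x[:n_done].sum() / n            # smallest n(1 - p) samples, normalized by n
        cols = []
        if r + 1 - l > 0:                    # fresh replicas launched at the fork
            cols.append(rng.choice(trace, size=(n_strag, r + 1 - l)))
        if l == 1:                           # the original copy keeps running
            cols.append(rng.choice(survivors, size=n_strag)[:, None])
        y = np.hstack(cols).min(axis=1)      # samples of the residual time Y
        t_hat.append(t1 + y.max())
        c_hat.append(c1 + p * t1 + (r + 1) * p * y.mean())
    return float(np.mean(t_hat)), float(np.mean(c_hat))
```

As a usage note, calling the function with $r = 0$ and $l = 1$ gives a baseline estimate without replication, against which other choices of $(p, r, l)$ can be compared on the same trace.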

While it is possible to estimate $\hat{T}$ and $\hat{C}$ directly from traces of the execution time, we take the two-step approach of first finding the empirical CDFs $\hat{F}_X$ and $\hat{F}_Y$ and then estimating $\hat{T}$ and $\hat{C}$. This is because, unlike samples of the task execution time $X$, the samples of the residual straggler execution time $Y$ (Lemma 2) cannot be directly obtained from the traces of task execution time. Also, this two-step approach allows system designers to improve each of the two estimates separately. For example, the CDFs $\hat{F}_X$ and $\hat{F}_Y$ can be smoothed using bootstrapping methods [9].

By the Central Limit Theorem we can show that the standard deviation of the error in estimating $E[C]$, and the first term in $E[T]$, converges to zero as $O(1/\sqrt{mn})$, where $m$ is the number of times the sampling procedure is repeated. But in general, the error in the maximum order statistic term in $E[T]$ converges to zero as $O(1/\sqrt{m})$, which is slower. Thus, the estimate $\hat{C}$ is more robust than $\hat{T}$.

4.2 Heuristic Search for the Best Policy 

We now present a heuristic algorithm that uses traces of execution time to search for the best single-fork policy. This policy can be used to perform task replication in future jobs that have similar task execution time statistics. The best single-fork policy is said to be the policy that minimizes the objective function $J$ defined as

$$J = E[T] + \mu E[C] \approx \hat{T} + \mu \hat{C}. \tag{24}$$

The parameter $\mu$ is the priority given to minimizing the cost. Since we know only the estimates $\hat{T}$ and $\hat{C}$ of the expected latency and cost from Algorithm 1, we use the estimated objective function in (24).

Algorithm 2 gives the pseudo-code of the search for the best single-fork parameters $p$, $r$ and $l$. For a given $p$, we first find the optimal $r$ and $l$, and then perform gradient descent on $p$. This is repeated for $k$ iterations. To optimize $r$ and $l$ we keep increasing $r$ by 1 as long as the objective function decreases. For each $r$, we set $l$ to the value (0 or 1) which gives a smaller $J$. The terms $\hat{T}(\pi)$ and $\hat{C}(\pi)$ are the latency and cost estimates for the policy $\pi = \pi_{SF}(p, r, l)$ found using Algorithm 1.
The policy found by Algorithm © may not be the true optimal single-fork policy due to the 
following error factors: 

1. Error in the estimates of E [T] and E [C\ from Algorithm© 

2. The gradient descent in p could be slow and may not converge to the optimum in k iterations. 
Also, note that E [T] and E [C] in (© and (© are convex in r, but not in p and l. Thus, the 
algorithm is not guaranteed to converge to the optimal single-fork policy. 
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Algorithm 2 Heuristic to Find Best Single-Fork Policy 
Initialize p = 0, r* = 0, l* = 0 
for i = 1, 2,... k do 

% For given p. optimize l and r 
while 1 do 

vr <r- 7TSF ( p,r*,l*) 

K <- Tsf (p,r* + 1,0) 

< 4- 7TSF (p, r* + 1, 1) _ 

if T( tt[) + pC( tt[) < T(tt'q) + pC(n' 0 ) then 

7 t' 7T^ 

else 

_/ , _/ 

7T ^7I- 0 

end if 

Aj <- T(tt') - T(vr) + p,(C(ir') - C'(tt)) 

if Aj < 0 then 

7T i — 7T / 

else 

break 
end if 
end while 

% Gradient Descent on p 

ttsf ( P,r*,l*) P- 7T 

vr' 7TSF (p + A p ,r*,l) 

A j ^ T(tt') - T(vr) + /r(C(7r') - C'(tt)) 
p^p- ApAj 

end for 


3. The task execution time statistics of future jobs may not match the empirical CDF Fx that 
was used to find the best policy in Algorithm [2j 
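The search in Algorithm 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `estimate` callable stands in for the latency and cost estimator of Section 4.1, and the names `heuristic_search` and `toy_estimate`, the `max_r` cap, and the clamping of `p` to [0, 1] are additions made here to keep the loop bounded. The toy estimator is purely hypothetical and only exercises the control flow.

```python
from typing import Callable, Tuple

# (p, r, l) -> (estimated latency, estimated cost)
Estimate = Callable[[float, int, int], Tuple[float, float]]


def heuristic_search(estimate: Estimate, mu: float, k: int = 25,
                     delta_p: float = 0.002, max_r: int = 50):
    """Search over single-fork parameters (p, r, l) in the spirit of Algorithm 2."""

    def objective(p: float, r: int, l: int) -> float:
        t_hat, c_hat = estimate(p, r, l)
        return t_hat + mu * c_hat

    p, r, l = 0.0, 0, 0
    for _ in range(k):
        # For the current p, add one replica at a time while the objective drops.
        while r < max_r:                       # max_r is an assumed safety cap
            j_now = objective(p, r, l)
            j0 = objective(p, r + 1, 0)
            j1 = objective(p, r + 1, 1)
            l_new, j_new = (1, j1) if j1 < j0 else (0, j0)
            if j_new - j_now < 0:
                r, l = r + 1, l_new
            else:
                break
        # Update of p, taken literally from the pseudo-code: p <- p - delta_p * delta_J.
        delta_j = objective(p + delta_p, r, l) - objective(p, r, l)
        p = min(max(p - delta_p * delta_j, 0.0), 1.0)   # clamping is an assumption
    return p, r, l


def toy_estimate(p: float, r: int, l: int) -> Tuple[float, float]:
    """Hypothetical estimator used only to exercise the search; not trace-based."""
    extra = r + 1 - l                          # copies launched at the forking point
    latency = 10.0 - 4.0 * p * extra / (extra + 1)
    cost = 1.0 + 0.3 * p * extra
    return latency, cost


if __name__ == "__main__":
    print(heuristic_search(toy_estimate, mu=1.0, k=25, delta_p=0.1))
```

In practice `estimate` would wrap a sampling-based estimator built from the empirical distribution, so each call is noisy and the accepted moves are only approximately descent steps.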

4.3 Demonstration using Google Traces 

We now demonstrate the results of running Algorithm 2 on Google Trace data [11]. We use the
traces to estimate the distribution $F_X$ and then run the heuristic algorithm on it to find the best
single-fork policy.

The Google Trace data gives timestamps of events such as SCHEDULE, EVICT, FINISH,
FAIL, KILL etc. for each of the tasks of computing jobs that are run on Google’s machine clusters.
We consider the difference between the SCHEDULE and FINISH timestamps as the task execution
time, and construct the empirical distribution $\hat{F}_X$. Note that because we consider only the
SCHEDULE and FINISH events, we are not accounting for the computing resources consumed due
to failure or eviction of tasks.
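The construction of the empirical distribution from event timestamps can be sketched as below. The event layout `(task_id, event_type, timestamp)` is a hypothetical simplification and not the actual trace schema, and the choice to measure from the most recent SCHEDULE event of a task (discarding attempts that end in EVICT, FAIL or KILL) is an assumption made for this sketch.

```python
from bisect import bisect_right


def task_execution_times(events):
    """Return sorted SCHEDULE-to-FINISH durations from (task_id, type, timestamp) tuples."""
    scheduled = {}      # task_id -> timestamp of its latest SCHEDULE event
    durations = []
    for task_id, event_type, timestamp in events:
        if event_type == "SCHEDULE":
            scheduled[task_id] = timestamp
        elif event_type == "FINISH" and task_id in scheduled:
            durations.append(timestamp - scheduled.pop(task_id))
        elif event_type in ("EVICT", "FAIL", "KILL"):
            scheduled.pop(task_id, None)   # attempt discarded, as noted in the text
    return sorted(durations)


def empirical_cdf(sorted_samples):
    """Return the empirical CDF x -> (number of samples <= x) / n."""
    n = len(sorted_samples)
    return lambda x: bisect_right(sorted_samples, x) / n


# Hypothetical events, for illustration only.
example_events = [
    (1, "SCHEDULE", 0), (2, "SCHEDULE", 1), (2, "EVICT", 3),
    (2, "SCHEDULE", 4), (1, "FINISH", 5), (2, "FINISH", 11),
]

if __name__ == "__main__":
    samples = task_execution_times(example_events)
    cdf = empirical_cdf(samples)
    print(samples, cdf(6))
```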

We consider two large Google cluster jobs with $n = 1017$ and $n = 488$ tasks respectively. Their
normalized histograms are shown in Fig. 9 and Fig. 10. Fig. 9 shows heavy-tail behavior of task
execution time, whereas Fig. 10 has bimodal behavior with a very small percentage of tasks finishing
in around 1400 seconds.

Then we run Algorithm 2 on the empirical $\hat{F}_X$ found using the histograms. We use $\Delta p = 0.002$,
$m = 500$ and $k = 25$. The latency-cost trade-offs of the heuristic policies found by the algorithm
are shown in Fig. 11 and Fig. 12. We also plot the estimated latency-cost trade-off for $r = 1$ and
$l = 1$ as $p$ varies from 0 to 1. These are the parameters of the back-up tasks option in MapReduce
as described in Remark 1.

Figure 9: Normalized histogram of the task execution times for a Google cluster job with n = 1017
tasks.

Figure 10: Normalized histogram of the task execution times for a Google cluster job with n = 488
tasks.

We observe in Fig. 11 that the heuristic algorithm finds policies with $r > 1$ that give lower
latency for the same cost than the policies with $r = 1$ and $l = 1$. We also observe that the latency
reduction is larger when $\mu$ is smaller, that is, when the priority given to minimizing the cost is lower. But
in Fig. 12, the reduction in latency from adding additional replicas ($r > 1$) is very small compared
to the $r = 1$, $l = 1$ case, because the task execution time distribution has a lighter tail. In both Fig. 11 and Fig. 12 we observe that
adding redundancy, that is $r > 1$, significantly reduces latency for a small cost, in comparison with
the baseline case ($p = 0$), which is also equivalent to $r = 0$, $l = 1$ with any $p$.


5 Proofs of Single-fork Analysis 

In this section we give proofs of the results in Section 3.

Figure 11: The latency-cost trade-off of policies found by Algorithm 2 with $\mu = 1, 2, 3$, for a Google
cluster job with 1017 tasks. The $r = 1$ and $l = 1$ (parameters of back-up tasks in MapReduce)
trade-off is also shown for comparison.


5.1 Latency and Cost Analysis for general $F_X$

Proof of Theorem 1. The expected latency $\mathbb{E}[T]$ can be divided into two parts: before and after
replication.

$$
\begin{aligned}
\mathbb{E}[T] &= \mathbb{E}[T^{(1)}] + \mathbb{E}[T^{(2)}] \\
&= \mathbb{E}[X_{(1-p)n:n}] + \mathbb{E}\Big[\max_{j=1,2,\dots,pn} Y_j\Big] \\
&= F_X^{-1}(1-p) + \mathbb{E}[Y_{pn:pn}]
\end{aligned}
\tag{25}
$$


The time before forking $T^{(1)}$ is the time until $(1-p)n$ of the $n$ tasks launched at time 0 finish.
Thus, its expected value $\mathbb{E}[T^{(1)}]$ is the expectation of the $(1-p)n$-th order statistic $X_{(1-p)n:n}$ of $n$
i.i.d. random variables with distribution $F_X$. By the Central Value Theorem stated as Theorem 10,
for $n \to \infty$ this term goes to the inverse CDF value $F_X^{-1}(1-p)$. At this forking point, the scheduler
introduces replicas of the $pn$ straggling tasks. The distribution $F_Y$ of the residual execution time
(minimum over the $r + 1$ replicas) of each straggling task is given by Lemma 2. Thus the term
$\mathbb{E}[T^{(2)}]$ in (25) is the expected value of the maximum of $pn$ i.i.d. random variables with distribution
$F_Y$.

Similarly, we can analyze the expected cost before and after forking. Recall from Definition 3
that the expected cost $\mathbb{E}[C]$ is the sum of the running times of all machines, normalized by the
number of tasks $n$.

Figure 12: The latency-cost trade-off of policies found by Algorithm 2 with $\mu = 1, 2, 3$, for a Google
cluster job with 488 tasks. The $r = 1$ and $l = 1$ (parameters of back-up tasks in MapReduce)
trade-off is also shown for comparison.


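\begin{align}
\mathbb{E}[C] &= \mathbb{E}[C^{(1)}] + \mathbb{E}[C^{(2)}], \tag{26} \\
\mathbb{E}[C^{(1)}] &= \frac{1}{n}\sum_{i=1}^{(1-p)n} \mathbb{E}[X_{i:n}] + \frac{pn}{n}\,\mathbb{E}[T^{(1)}] \tag{27} \\
&= \frac{1}{n}\sum_{i=1}^{(1-p)n} F_X^{-1}\Big(\frac{i}{n}\Big) + pF_X^{-1}(1-p) \tag{28} \\
&= \int_0^{1-p} F_X^{-1}(h)\,dh + pF_X^{-1}(1-p), \tag{29} \\
\mathbb{E}[C^{(2)}] &= \frac{1}{n}\sum_{j=1}^{pn} (r+1)\,\mathbb{E}[Y_j] \tag{30} \\
&= (r+1)p\cdot\mathbb{E}[Y]. \tag{31}
\end{align}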


The cost before forking $\mathbb{E}[C^{(1)}]$ consists of two parts: the cost for the $(1-p)n$ tasks that finish
first, and the cost for the $pn$ straggling tasks. The first term in (27) is the sum of the expected
values of the smallest $(1-p)n$ execution times. Using Theorem 10 we can show that the $i$-th term
in the summation goes to $F_X^{-1}(i/n)$ as $n \to \infty$. Expressing the sum as an integral over $h = i/n$
we get the first term in (29). The second term in (27) is the normalized running time of the $pn$
straggling tasks before forking. Substituting $\mathbb{E}[T^{(1)}]$ from (25) and simplifying, we get (29).

The cost after forking, $\mathbb{E}[C^{(2)}]$, is the normalized sum of the runtimes of the $r + 1$ replicas of
each of the $pn$ straggling tasks. By Lemma 2, the residual execution time of the $j$-th straggling
task is $Y_j \sim F_Y$. Since the scheduler kills all replicas as soon as one replica finishes, the expected
runtime for the $j$-th straggling task is $(r+1)\mathbb{E}[Y_j]$. Thus, the cost in (30) is the sum of $(r+1)\mathbb{E}[Y_j]$
over the $pn$ tasks, normalized by $n$. Since the $Y_j$ are i.i.d., we can reduce this to (31).


□ 
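The decomposition used in the proof can be checked empirically with a small simulation of one job under a single-fork policy. The sketch below is illustrative only: the function name `simulate_single_fork`, the rounding of $pn$ to an integer, and the example distribution are assumptions made here. It returns the latency $T^{(1)} + \max_j Y_j$ and the cost computed as in (27) and (30).

```python
import random


def simulate_single_fork(draw, n, p, r, l, rng):
    """One simulated job under a single-fork policy; returns (latency, cost)."""
    x = sorted(draw(rng) for _ in range(n))      # execution times of the n original tasks
    s = round(p * n)                             # number of stragglers, pn (rounded)
    if s == 0:                                   # no forking: plain parallel execution
        return x[-1], sum(x) / n
    t1 = x[n - s - 1] if s < n else 0.0          # forking time T^(1)
    finished, stragglers = x[:n - s], x[n - s:]
    residuals = []
    for x_orig in stragglers:
        fresh = [draw(rng) for _ in range(r + 1 - l)]   # copies started at the fork
        kept = [x_orig - t1] if l == 1 else []          # original copy keeps running
        residuals.append(min(fresh + kept))             # residual time Y of this task
    latency = t1 + max(residuals)
    cost = (sum(finished) + s * t1 + (r + 1) * sum(residuals)) / n
    return latency, cost


def shifted_exponential(rng):
    """Example execution time: shift 1.0 plus Exp(1); an arbitrary choice."""
    return 1.0 + rng.expovariate(1.0)


if __name__ == "__main__":
    print(simulate_single_fork(shifted_exponential, n=400, p=0.1, r=1, l=0,
                               rng=random.Random(0)))
```

With $r = 0$ and $l = 1$ no copies are added, so the function reduces to the baseline: the latency is the largest execution time and the cost is the sample mean. Averaging many runs gives Monte Carlo estimates that can be compared against (25) and (26) to (31) for large $n$.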


Proof of Lemma 2. First consider the case where we relaunch ($l = 0$) the original copy, and add $r$
replicas for each of the $pn$ straggling tasks. Thus, there are $r + 1$ identical replicas of each task
after forking. The residual execution time $Y$ of each task (measured from the time $T^{(1)}$ when the replicas are
added) is the minimum of $r + 1$ i.i.d. random variables with distribution $F_X$. Hence,

\begin{align}
\Pr(Y > y) &= \Pr(\min(X_1, X_2, \dots, X_{r+1}) > y), \tag{32} \\
\bar{F}_Y(y) &= \bar{F}_X(y)^{r+1} \quad \text{for } l = 0. \tag{33}
\end{align}


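For the case without relaunching ($l = 1$), there is 1 original replica and $r$ new replicas of each of
the straggling tasks. Thus, the tail distribution $\bar{F}_Y(y) = 1 - F_Y(y)$ is given by

\begin{align}
\Pr(Y > y) &= \Pr(X_1 > T^{(1)} + y \mid X_1 > T^{(1)}) \cdot \Pr(\min(X_2, \dots, X_{r+1}) > y), \tag{34} \\
\bar{F}_Y(y) &= \frac{\bar{F}_X(y + T^{(1)})}{\bar{F}_X(T^{(1)})}\,\bar{F}_X(y)^{r}. \tag{35}
\end{align}

As the number of tasks $n \to \infty$, by Theorem 10 we know that $T^{(1)} \to F_X^{-1}(1-p)$, so that
$\bar{F}_X(T^{(1)}) \to p$. Hence we have

$$
\bar{F}_Y(y) = \frac{1}{p}\,\bar{F}_X\big(y + F_X^{-1}(1-p)\big)\,\bar{F}_X(y)^{r} \quad \text{for } l = 1. \tag{36}
$$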
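The two cases of the residual tail distribution can be written as one small helper. This is a sketch under the large-$n$ approximation $T^{(1)} = F_X^{-1}(1-p)$; the names `residual_tail` and `make_sexp` are hypothetical, and the shifted exponential is used only as an example input.

```python
import math


def residual_tail(tail_x, quantile_x, p, r, l):
    """Tail function of the residual time Y, given the tail and quantile functions of X."""
    if l == 0:
        # Relaunching: minimum of r + 1 fresh copies.
        return lambda y: tail_x(y) ** (r + 1)
    t1 = quantile_x(1.0 - p)            # limiting forking time F_X^{-1}(1 - p)
    # No relaunching: original copy conditioned on X > t1, plus r fresh copies.
    return lambda y: tail_x(y + t1) * tail_x(y) ** r / p


def make_sexp(delta, lam):
    """Tail and quantile functions of a shifted exponential with shift delta and rate lam."""
    def tail(x):
        return 1.0 if x < delta else math.exp(-lam * (x - delta))

    def quantile(u):
        return delta - math.log(1.0 - u) / lam

    return tail, quantile


if __name__ == "__main__":
    tail_x, quantile_x = make_sexp(delta=1.0, lam=2.0)
    tail_y = residual_tail(tail_x, quantile_x, p=0.2, r=2, l=1)
    print(tail_y(0.5), tail_y(1.5))
```

For the shifted exponential the helper gives $e^{-\lambda y}$ below the shift and $e^{\lambda r \Delta} e^{-\lambda(r+1)y}$ above it when $l = 1$, which is the piecewise form used later in Section 5.3.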


To prove Lemma 3, we characterize the expected maximum of a large number of random variables
using Theorem 11. First, we state Lemma 9, which implies that the domain of attraction (see
Theorem 12) of $F_Y$ is the same as that of $F_X$.

Lemma 9 (Domain of attraction for $F_Y$). Given a single-fork policy $\pi_{SF}(p, r, l)$ with $0 < p < 1$,

1. if $F_X \in DA(\Lambda)$, then $F_Y \in DA(\Lambda)$;

2. if $F_X \in DA(\Phi_\xi)$, then $F_Y \in DA(\Phi_{(r+1)\xi})$;

3. if $F_X \in DA(\Psi_\xi)$, then $F_Y \in DA(\Psi_{(r(1-l)+1)\xi})$.

The proof follows from Lemma 2 and Theorem 12 and is omitted here.


Proof of Lemma 3. We can use Lemma 9 to find the domain of attraction of $F_Y$. Then from (55)
we have

$$
\mathbb{E}[Y_{n:n}] = a_n\,\mathbb{E}[G(y)] + b_n,
$$

where $\mathbb{E}[G(y)]$ can be found using Theorem 11 and Lemma 13.


□ 


5.2 Latency and Cost Analysis for Pareto $F_X$

We prove Theorem 4, which evaluates the latency $\mathbb{E}[T]$ and computing cost $\mathbb{E}[C]$ metrics when the
task execution time distribution $F_X$ is Pareto, as defined in (12).


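Proof of Theorem 4. From Theorem 1 we have

\begin{align}
\mathbb{E}[T] &= F_X^{-1}(1-p) + \mathbb{E}[Y_{pn:pn}] \notag \\
&= x_m p^{-1/\alpha} + \tilde{a}_{pn}\,\mathbb{E}[\Phi_{(r+1)\alpha}] \tag{37} \\
&= x_m p^{-1/\alpha} + \tilde{a}_{pn}\,\Gamma\Big(1 - \frac{1}{(r+1)\alpha}\Big), \tag{38} \\
\mathbb{E}[C] &= \int_0^{1-p} F_X^{-1}(h)\,dh + pF_X^{-1}(1-p) + (r+1)p\cdot\mathbb{E}[Y] \tag{39} \\
&= x_m\int_0^{1-p} (1-h)^{-1/\alpha}\,dh + p\,x_m p^{-1/\alpha} + (r+1)p\cdot\mathbb{E}[Y] \notag \\
&= x_m\frac{\alpha}{\alpha-1}\big[1 - p^{1-1/\alpha}\big] + x_m p^{1-1/\alpha} + (r+1)p\cdot\mathbb{E}[Y] \notag \\
&= x_m\frac{\alpha}{\alpha-1} - x_m\frac{p^{1-1/\alpha}}{\alpha-1} + (r+1)p\cdot\mathbb{E}[Y]. \tag{40}
\end{align}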


To obtain (37) we first observe that since $F_X$ is Pareto, by Theorem 12 it falls into the Frechet domain
of attraction, i.e. $F_X \in DA(\Phi_\alpha)$. Then using Lemma 9 we can show that $F_Y \in DA(\Phi_{(r+1)\alpha})$.
Subsequently, using Theorem 11 and Lemma 13 we get (38). To derive the expected cost (40) we
substitute $F_X^{-1}(h) = x_m(1-h)^{-1/\alpha}$ in the first and second terms in (39) and simplify the expression.

To find $\tilde{a}_{pn}$ and $\mathbb{E}[Y]$ in (38) and (40) respectively we consider the cases of relaunching ($l = 0$)
and no relaunching ($l = 1$) separately.

Case 1: Relaunching ($l = 0$)

In the single-fork policy with relaunching ($l = 0$), the scheduler waits for $(1-p)n$ tasks to finish
and then relaunches each of the $pn$ straggler tasks on a new machine, so that

$$
Y = \min(X_1, X_2, \dots, X_{r+1}), \qquad Y \sim \mathrm{Pareto}((r+1)\alpha,\ x_m). \tag{41}
$$

From (59) in Theorem 11 we can evaluate $\tilde{a}_{pn}$ as follows:

$$
\tilde{a}_{pn} = F_Y^{-1}\Big(1 - \frac{1}{pn}\Big) = x_m (pn)^{1/((r+1)\alpha)}.
$$

And $\mathbb{E}[Y]$ of (41) can be evaluated as

$$
\mathbb{E}[Y] = \frac{(r+1)\alpha\,x_m}{(r+1)\alpha - 1}. \tag{42}
$$
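For the relaunching case every quantity is in closed form, so (38), (40) and (42) can be evaluated directly. The sketch below assumes $\alpha > 1$ (finite mean) and uses the Pareto quantile $F_Y^{-1}(1 - 1/(pn)) = x_m (pn)^{1/((r+1)\alpha)}$ for the normalizing constant; the function name `pareto_relaunch_metrics` is hypothetical.

```python
import math


def pareto_relaunch_metrics(alpha, x_m, n, p, r):
    """Asymptotic latency and cost for Pareto(alpha, x_m) tasks with relaunching (l = 0)."""
    shape_y = (r + 1) * alpha                       # Y ~ Pareto((r + 1) * alpha, x_m)
    a_pn = x_m * (p * n) ** (1.0 / shape_y)         # F_Y^{-1}(1 - 1/(pn))
    latency = x_m * p ** (-1.0 / alpha) + a_pn * math.gamma(1.0 - 1.0 / shape_y)
    mean_y = shape_y * x_m / (shape_y - 1.0)        # E[Y], requires shape_y > 1
    cost = (x_m * alpha / (alpha - 1.0)
            - x_m * p ** (1.0 - 1.0 / alpha) / (alpha - 1.0)
            + (r + 1) * p * mean_y)
    return latency, cost


if __name__ == "__main__":
    print(pareto_relaunch_metrics(alpha=2.0, x_m=1.0, n=400, p=0.25, r=1))
```

Sweeping `p` for a fixed `r` traces out one latency-cost curve; repeating this for several values of `r` shows how the trade-off shifts as replicas are added.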


Case 2: No Relaunching ($l = 1$)

In the single-fork policy with no relaunching ($l = 1$), the scheduler keeps the original copy, and
adds $r$ additional replicas for each straggling task. Using Lemma 2 we can show that, for $y \geq x_m$,

$$
\bar{F}_Y(y) = \frac{1}{p}\Big(\frac{x_m}{y}\Big)^{\alpha r}\Big(\frac{x_m}{y + x_m p^{-1/\alpha}}\Big)^{\alpha}. \tag{43}
$$

From (59) in Theorem 11, $\tilde{a}_{pn} = F_Y^{-1}(1 - 1/(pn))$, which simplifies to

$$
(pn)^{1/\alpha} = \Big(1 + \frac{\tilde{a}_{pn}}{x_m p^{-1/\alpha}}\Big)\Big(\frac{\tilde{a}_{pn}}{x_m}\Big)^{r}.
$$

The expected value of $Y$ can be found by numerically integrating $\bar{F}_Y(y)$ in (43) over its support.


□

Proof of Lemma 5. For given $r$, the expected latency is lower with relaunching when $p$ satisfies

\begin{align*}
\mathbb{E}[T]^{(l=1)} &> \mathbb{E}[T]^{(l=0)}, \\
\tilde{a}_{pn}^{(l=1)} &> \tilde{a}_{pn}^{(l=0)}, \\
n^{1/\alpha} x_m^{r+1} &> x_m p^{-1/\alpha}\,(pn)^{\frac{r}{(r+1)\alpha}}\,x_m^{r} + (pn)^{\frac{1}{\alpha}}\,x_m^{r+1}, \\
1 &> (pn)^{-\frac{1}{(r+1)\alpha}} + p^{1/\alpha}.
\end{align*}

□
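The no-relaunching quantities for the Pareto case have no closed form, so they are natural candidates for a numerical sketch. The code below solves $\bar{F}_Y(\tilde{a}_{pn}) = 1/(pn)$ by bisection and integrates the tail with a midpoint rule; both numerical methods, the substitution $y = x_m/u$ on the infinite part, and all function names are choices made for this illustration, not taken from the paper. It assumes $\alpha(r+1) > 1$ so that $\mathbb{E}[Y]$ is finite.

```python
def no_relaunch_tail(alpha, x_m, p, r):
    """Tail of the residual time Y for Pareto tasks without relaunching (l = 1)."""
    t1 = x_m * p ** (-1.0 / alpha)                  # limiting forking time F_X^{-1}(1 - p)

    def tail(y):
        original = (x_m / (y + t1)) ** alpha / p    # original copy, given X > t1
        fresh = 1.0 if y < x_m else (x_m / y) ** (alpha * r)   # r new copies
        return original * fresh

    return tail


def solve_a_pn(tail, p, n):
    """Bisection for the point a where tail(a) = 1 / (pn); tail must be decreasing."""
    target = 1.0 / (p * n)
    lo, hi = 0.0, 1.0
    while tail(hi) > target:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if tail(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mean_from_tail(tail, x_m, steps=20000):
    """E[Y] as the integral of the tail: midpoint rule on [0, x_m] and on u = x_m / y."""
    h = x_m / steps
    head = h * sum(tail((i + 0.5) * h) for i in range(steps))
    du = 1.0 / steps
    rest = 0.0
    for i in range(steps):
        u = (i + 0.5) * du
        rest += tail(x_m / u) * x_m / (u * u) * du
    return head + rest


if __name__ == "__main__":
    alpha, x_m, p, n, r = 2.0, 1.0, 0.25, 400, 1
    tail = no_relaunch_tail(alpha, x_m, p, r)
    a_no_relaunch = solve_a_pn(tail, p, n)
    a_relaunch = x_m * (p * n) ** (1.0 / ((r + 1) * alpha))
    condition = (p * n) ** (-1.0 / ((r + 1) * alpha)) + p ** (1.0 / alpha) < 1.0
    print(a_no_relaunch, a_relaunch, condition, mean_from_tail(tail, x_m))
```

For these example parameters the final condition in the proof of Lemma 5 holds, and correspondingly the normalizing constant without relaunching exceeds the one with relaunching.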


Proof of Lemma 6. Given $r$ and relaunching ($l = 0$), the single-fork policy is sub-optimal in both
$\mathbb{E}[T]$ and $\mathbb{E}[C]$ if $p$ satisfies

$$
\frac{d\,\mathbb{E}[T]}{dp}\cdot\frac{d\,\mathbb{E}[C]}{dp} > 0.
$$

Substituting $\mathbb{E}[T]$ and $\mathbb{E}[C]$ from Theorem 4 and simplifying, we get (20).


□ 


5.3 Latency and Cost Analysis for Shifted Exponential $F_X$

Now we prove Theorem 7, which gives the latency-cost trade-off when the distribution of the
execution time $X$ is a shifted exponential given by (21).


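Proof of Theorem 7.

\begin{align}
\mathbb{E}[T] &= F_X^{-1}(1-p) + \mathbb{E}[Y_{pn:pn}] \notag \\
&= \Delta - \frac{1}{\lambda}\ln p + \tilde{a}_{pn}\,\mathbb{E}[\Lambda] + \tilde{b}_{pn} \tag{44} \\
&= \Delta - \frac{1}{\lambda}\ln p + \tilde{a}_{pn}\,\gamma_{EM} + \tilde{b}_{pn}, \tag{45} \\
\mathbb{E}[C] &= \int_0^{1-p} F_X^{-1}(h)\,dh + pF_X^{-1}(1-p) + (r+1)p\cdot\mathbb{E}[Y] \tag{46} \\
&= \int_0^{1-p}\Big(\Delta - \frac{1}{\lambda}\ln(1-h)\Big)dh + p\Big(\Delta - \frac{1}{\lambda}\ln p\Big) + (r+1)p\cdot\mathbb{E}[Y] \tag{47} \\
&= (1-p)\Delta + \frac{1}{\lambda}\big(p\ln p + (1-p)\big) + p\Delta - \frac{p}{\lambda}\ln p + (r+1)p\cdot\mathbb{E}[Y] \tag{48} \\
&= \Delta + \frac{1-p}{\lambda} + (r+1)p\cdot\mathbb{E}[Y]. \tag{49}
\end{align}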

To find $\mathbb{E}[Y]$, $\tilde{a}_{pn}$ and $\tilde{b}_{pn}$ we consider the cases of relaunching ($l = 0$) and no relaunching ($l = 1$)
separately.

Case 1: Relaunching ($l = 0$)

\begin{align}
Y &= \min\{X_1, X_2, \dots, X_{r+1}\} \tag{50} \\
&\sim \mathrm{SExp}(\Delta, (r+1)\lambda), \tag{51} \\
\mathbb{E}[Y] &= \Delta + \frac{1}{(r+1)\lambda}. \tag{52}
\end{align}

Based on Theorem 12, for $\eta(y) = 1/((r+1)\lambda)$ we have

$$
\lim_{y \to \omega(F_Y)} \frac{\bar{F}_Y(y + u\,\eta(y))}{\bar{F}_Y(y)} = e^{-u}. \tag{53}
$$

By Theorem 11 and Theorem 12, the maximum of shifted exponential random variables belongs to the Gumbel family
with

$$
\tilde{a}_{pn} = \frac{1}{\lambda(1+r)}, \qquad
\tilde{b}_{pn} = \bar{F}_Y^{-1}\Big(\frac{1}{pn}\Big) = \Delta + \frac{\ln(pn)}{\lambda(r+1)}.
$$


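Case 2: No Relaunching ($l = 1$)

In the case of no relaunching,

$$
Y = \min\{\mathrm{Exp}(\lambda),\ \Delta + \mathrm{Exp}(r\lambda)\}.
$$

Note that the first term does not include $\Delta$ because for large $n$ the original task would have run
for at least $\Delta$ seconds. Thus the tail distribution of $Y$ is given by

$$
\bar{F}_Y(y) =
\begin{cases}
e^{-\lambda y} & 0 < y < \Delta, \\
e^{\lambda r\Delta}\,e^{-\lambda(r+1)y} & y \geq \Delta.
\end{cases}
\tag{54}
$$

The expected value $\mathbb{E}[Y]$ is the integral of $\bar{F}_Y(y)$ over its support:

\begin{align*}
\mathbb{E}[Y] &= \int_0^{\Delta} e^{-\lambda y}\,dy + \int_{\Delta}^{\infty} e^{\lambda r\Delta}\,e^{-\lambda(r+1)y}\,dy \\
&= \frac{1 - e^{-\lambda\Delta}}{\lambda} + \frac{e^{-\lambda\Delta}}{\lambda(r+1)}.
\end{align*}

By Theorem 11 and Theorem 12, similar to the relaunching case, we have

$$
\tilde{a}_{pn} = \frac{1}{\lambda(1+r)}, \qquad
\tilde{b}_{pn} = \bar{F}_Y^{-1}\Big(\frac{1}{pn}\Big) = \frac{r\Delta}{r+1} + \frac{\ln(pn)}{\lambda(r+1)}.
$$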


□ 
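The latency expression and the two residual means for the shifted exponential case can be evaluated with a few lines of Python. This is a sketch: the function names are hypothetical, the Euler-Mascheroni constant is hard-coded, and the no-relaunching value of $\tilde{b}_{pn}$ is only meaningful when it is at least $\Delta$, that is when $\ln(pn) \geq \lambda\Delta$.

```python
import math

EULER_GAMMA = 0.5772156649015329   # Euler-Mascheroni constant


def sexp_latency(delta, lam, n, p, r, l):
    """Asymptotic expected latency for SExp(delta, lam) tasks under a single-fork policy."""
    a_pn = 1.0 / (lam * (r + 1))
    shift = delta if l == 0 else r * delta / (r + 1)
    b_pn = shift + math.log(p * n) / (lam * (r + 1))
    return delta - math.log(p) / lam + a_pn * EULER_GAMMA + b_pn


def sexp_mean_residual(delta, lam, r, l):
    """E[Y]: closed forms for relaunching (l = 0) and no relaunching (l = 1)."""
    if l == 0:
        return delta + 1.0 / ((r + 1) * lam)
    return ((1.0 - math.exp(-lam * delta)) / lam
            + math.exp(-lam * delta) / (lam * (r + 1)))


if __name__ == "__main__":
    for l in (0, 1):
        print(l, sexp_latency(1.0, 1.0, 1000, 0.1, 1, l),
              sexp_mean_residual(1.0, 1.0, 1, l))
```

The two latency values differ by exactly $\Delta/(r+1)$, the gap between the two expressions for $\tilde{b}_{pn}$, with no relaunching giving the smaller value.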


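Proof of Lemma 8. For the shifted exponential distribution it is clear that no relaunching always
gives a lower latency. We now find conditions on $\Delta$ and $\lambda$ for which no relaunching gives lower
cost. Define $\beta = \lambda\Delta$. No relaunching gives a higher cost when

\begin{align*}
\lambda\,\mathbb{E}[C(l=1)] &> \lambda\,\mathbb{E}[C(l=0)], \\
(r+1)p\,\lambda\,\mathbb{E}[Y]^{(l=1)} &> (r+1)p\,\lambda\,\mathbb{E}[Y]^{(l=0)}, \\
r\big(1 - e^{-\lambda\Delta}\big) + 1 &> (r+1)\lambda\Delta + 1, \\
\beta r + \beta - r + re^{-\beta} &< 0,
\end{align*}

where the second line holds because the cost before forking does not depend on $l$.
Note that the function $g(\beta) = \beta r + \beta - r + re^{-\beta}$ is monotonically increasing, since
$g'(\beta) = r + 1 - re^{-\beta} > 0$ for any $r \in \mathbb{Z}^+$. Since $g(0) = 0$, we have $g(\beta) > 0$ for all $\beta > 0$,
so the inequality above never holds and no relaunching never gives a higher cost. □

6 Concluding Remarks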

6.1 Major Implications 

In this paper we show that replication of the slowest tasks in a job (the stragglers) is a powerful way 
to reduce latency in large-scale parallel computing. We characterize the trade-off between latency 
and computing cost for a set of replication strategies called single-fork policies. The policy used 
in practical systems such as MapReduce belongs to this set of single-fork policies. A non-intuitive 
insight we get from this analysis is that in certain scenarios it is possible to reduce latency and cost 
simultaneously. We also propose a heuristic algorithm to find the scheduling policy which achieves 
a good trade-off between latency and cost. Experiments on Google Traces data show that policies 
found by the heuristic algorithm can give a better latency-cost trade-off than the back-up tasks 
option used in MapReduce. 

6.2 Future Perspectives 

Although we focus on single-fork policies in this work, the analysis can be extended to multi-fork 
policies where the scheduler can add or kill replicas at multiple instances during the execution. We 
also plan to consider queueing of tasks at machines and analyze the percentile latency in addition 
to the expected value considered here. Another future direction is to analyze the performance of 
approximate computing where only a subset of the tasks of a job are required to be completed, for 
example information retrieval and machine learning applications. 

In this work, we assume prior knowledge of the execution time distribution $F_X$ when designing
the task replication strategy. An interesting research direction is to develop an adaptive scheduling
algorithm that simultaneously estimates the distribution and schedules task replication. This
shares some similarity with the celebrated multi-armed bandit problem, with an exploration-exploitation
trade-off between estimating the distribution $F_X$ and using the current estimate to design a task
replication policy.

Our analytical framework can be used in other applications where the response time of the
components is stochastic. For example, in crowdsourcing, each worker can take a variable amount
of time to complete a task. Then the overall latency can be modeled in the same way as our work,
with the cost function being equal to the number of workers [20].

In this section we introduce some definitions and results in order statistics that play an important
role in the analysis of scheduling policies in Section 3. In particular, we describe how order statistics
in different regimes have different concentration behavior in Appendices A and B.


A Central order statistics 


An order statistic $X_{k:n}$ is called a central order statistic if $k \approx np$ for some $p \in (0, 1)$. In
this case, $X_{k:n}$ is asymptotically normal, concentrated around the $p$-th quantile of $X$, as indicated
by the following result called the Central Value Theorem (Theorem 10.3 in [6]).

Theorem 10 (Central Value Theorem). Given $X_1, X_2, \dots, X_n \overset{\text{i.i.d.}}{\sim} F$, if $0 < p < 1$ and
$0 < f(x_p) < \infty$, where $x_p = F^{-1}(p)$, then for $k = k(n)$ such that $k = np + o(\sqrt{n})$,

$$
X_{k:n} \xrightarrow{d} N\Big(x_p,\ \frac{p(1-p)}{n f^2(x_p)}\Big),
$$

where $f(\cdot)$ is the p.d.f. corresponding to $F$ and $\xrightarrow{d}$ denotes convergence in distribution as $n \to \infty$.
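The concentration of a central order statistic around the corresponding quantile is easy to observe by simulation. The sketch below uses Exp(1) samples, for which the median is $\ln 2$; the function name, the choice of distribution, and the sample sizes are arbitrary illustrative choices.

```python
import math
import random


def central_order_statistic_mean(n, p, trials, seed=0):
    """Average of the k-th order statistic (k = ceil(np)) of n Exp(1) samples."""
    rng = random.Random(seed)
    k = math.ceil(n * p)
    total = 0.0
    for _ in range(trials):
        xs = sorted(rng.expovariate(1.0) for _ in range(n))
        total += xs[k - 1]          # k-th smallest value, 1-indexed
    return total / trials


if __name__ == "__main__":
    estimate = central_order_statistic_mean(n=2000, p=0.5, trials=100)
    print(estimate, math.log(2.0))
```

For this example the theorem predicts a standard deviation of about $1/\sqrt{n}$ for a single order statistic, so with $n = 2000$ the sample values stay within a few hundredths of $\ln 2$.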


B Extreme order statistics 

Extreme value theory (EVT) is an asymptotic theory of extremes, i.e., minima and maxima. It
shows that if a distribution belongs to one of three families of distributions (Theorem 12), then its
maxima can be well characterized asymptotically as given by Theorem 11, which is also referred to
as the Fisher-Tippett-Gnedenko Theorem (Theorem 1.1.3 in [7]).

Theorem 11 (Extreme Value Theorem). Given $X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} F$, suppose there exist sequences
of constants $a_n > 0$ and $b_n \in \mathbb{R}$ such that

$$
P\big[(X_{n:n} - b_n)/a_n \leq x\big] \to G(x) \tag{55}
$$

as $n \to \infty$, where $G(\cdot)$ is a non-degenerate distribution. Then the extreme value distribution $G(x)$ and the
values of $a_n$ and $b_n$ depend on the domain of attraction (and hence the tail behavior) of $F$ given
by Theorem 12.

1. For $F \in DA(\Lambda)$,

\begin{align}
a_n &= \eta\big(F^{-1}(1 - 1/n)\big), \tag{56} \\
b_n &= F^{-1}(1 - 1/n), \tag{57} \\
G(x) &= \Lambda(x) = \exp\{-\exp(-x)\}, \tag{58}
\end{align}

where $\Lambda(x)$ is called the Gumbel distribution.

2. For $F \in DA(\Phi_\xi)$,

\begin{align}
a_n &= F^{-1}(1 - 1/n), \tag{59} \\
b_n &= 0, \tag{60} \\
G(x) &= \Phi_\xi(x) =
\begin{cases}
0 & x \leq 0, \\
\exp\{-x^{-\xi}\} & x > 0,
\end{cases} \tag{61}
\end{align}

where $\Phi_\xi(x)$ is called the Frechet distribution.

3. For $F \in DA(\Psi_\xi)$,

\begin{align}
a_n &= \omega(F) - F^{-1}(1 - 1/n), \tag{62} \\
b_n &= \omega(F), \tag{63} \\
G(x) &= \Psi_\xi(x) =
\begin{cases}
\exp\{-(-x)^{\xi}\} & x < 0, \\
1 & x \geq 0,
\end{cases} \tag{64}
\end{align}

where $\Psi_\xi(x)$ is called the reversed-Weibull distribution.

Theorem 12 (Domains of attraction). The domain of attraction of a distribution function $F$, and
hence its extreme value distribution $G(x)$, is determined by the following conditions:

1. $F \in DA(\Lambda)$ if and only if there exists $\eta(x) > 0$ such that

$$
\lim_{x \to \omega(F)} \frac{\bar{F}(x + t\,\eta(x))}{\bar{F}(x)} = e^{-t}, \quad t \in \mathbb{R};
$$

2. $F \in DA(\Phi_\xi)$ if and only if $\omega(F) = \infty$ and

$$
\lim_{x \to \infty} \frac{\bar{F}(tx)}{\bar{F}(x)} = t^{-\xi}, \quad t > 0;
$$

3. $F \in DA(\Psi_\xi)$ if and only if $\omega(F) < \infty$ and

$$
\lim_{x \to 0^+} \frac{\bar{F}(\omega(F) - tx)}{\bar{F}(\omega(F) - x)} = t^{\xi}, \quad t > 0;
$$

where $\omega(F) = \sup\{x : F(x) < 1\}$ is the upper end point of the distribution $F$.

Intuitively, $F \in DA(\Lambda)$ corresponds to the case that $F$ has an exponentially decaying tail,
$F \in DA(\Phi_\xi)$ corresponds to the case that $F$ has a heavy tail (such as polynomially decaying), and
$F \in DA(\Psi_\xi)$ corresponds to the case that $F$ has a short tail with finite upper bound.

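Lemma 13 (Expected Extreme Values).

\begin{align*}
\mathbb{E}[\Lambda] &= \gamma_{EM}, \\
\mathbb{E}[\Phi_\xi] &=
\begin{cases}
\Gamma(1 - 1/\xi) & \xi > 1, \\
+\infty & \text{otherwise,}
\end{cases} \\
\mathbb{E}[\Psi_\xi] &= -\Gamma(1 + 1/\xi),
\end{align*}

where $\gamma_{EM}$ is the Euler-Mascheroni constant and $\Gamma(\cdot)$ is the Gamma function, i.e.,

$$
\Gamma(t) = \int_0^{\infty} x^{t-1} e^{-x}\,dx.
$$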
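As a quick sanity check of how Theorem 11 and Lemma 13 combine, consider $n$ i.i.d. Exp($\lambda$) random variables, which lie in the Gumbel domain with $\eta(x) = 1/\lambda$. The expected maximum is known exactly as the harmonic sum $H_n/\lambda$, and the extreme value approximation $a_n\gamma_{EM} + b_n$ with $a_n = 1/\lambda$ and $b_n = \ln(n)/\lambda$ should be close to it. The function names below are hypothetical and the example is not taken from the paper.

```python
import math

EULER_GAMMA = 0.5772156649015329   # Euler-Mascheroni constant


def expected_max_exponential_exact(n, lam=1.0):
    """Exact E[max of n i.i.d. Exp(lam)]: the harmonic number H_n divided by lam."""
    return sum(1.0 / i for i in range(1, n + 1)) / lam


def expected_max_exponential_evt(n, lam=1.0):
    """Extreme value approximation a_n * E[Lambda] + b_n for the Gumbel domain."""
    a_n = 1.0 / lam                 # eta(F^{-1}(1 - 1/n)) for the exponential
    b_n = math.log(n) / lam         # F^{-1}(1 - 1/n)
    return a_n * EULER_GAMMA + b_n


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, expected_max_exponential_exact(n), expected_max_exponential_evt(n))
```

The gap between the two values shrinks roughly like $1/(2n\lambda)$, which illustrates why the approximation is used for large numbers of tasks.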

We can also characterize the limit distribution of the sample extreme $X_{1:n}$ analogously via
Theorem 12 by

$$
X_{1:n} = \min\{X_1, \dots, X_n\} = -\max\{-X_1, \dots, -X_n\}.
$$

It is worth noting that the distribution function for $-X$ may be in a different domain of attraction
from that of $X$.